ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint
Large-scale online recommender systems, now ubiquitous on the Internet, are
responsible for two basic tasks: Click-Through Rate (CTR) and Post-Click Conversion
Rate (CVR) estimation. However, traditional CVR estimators suffer from the
well-known Sample Selection Bias and Data Sparsity issues. Entire space models
were proposed to address these two issues by tracing the decision-making path
"exposure→click→purchase". Further, some researchers observed that
purchase-related behaviors occur between click and purchase, which can better capture
the user's decision-making intention and improve recommendation
performance. Thus, the decision-making path has been extended to
"exposure→click→in-shop action→purchase" and can be modeled with a conditional
probability approach. Nevertheless, we observe that the chain rule of
conditional probability does not always hold. We report the Probability Space
Confusion (PSC) issue and mathematically derive the difference between the ground
truth and the estimation. We propose a novel Entire Space Multi-Task Model
for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two
alternatives, Entire Space Multi-Task Model with Siamese Network (ESMS) and
Entire Space Multi-Task Model in Global Domain (ESMG), to address the PSC issue.
Specifically, we handle "exposure→click→in-shop action" and "in-shop
action→purchase" separately in light of the characteristics of in-shop actions.
The first path is still treated with conditional probability, while the second
is treated with a parameter constraint strategy. Experiments in both offline
and online environments of a large-scale recommendation system illustrate the
superiority of our proposed methods over state-of-the-art models. The
real-world datasets will be released.
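As a toy illustration of how a path-based estimate can drift from the ground truth, the sketch below uses made-up counts (not data from the paper). It assumes, hypothetically, that some purchases happen right after a click with no in-shop action; the product P(in-shop action | click) × P(purchase | in-shop action) then misses those purchases. This is one intuition for the gap the abstract mentions, not the authors' derivation.

```python
from fractions import Fraction

# Hypothetical toy counts (illustrative only, not from the paper).
clicks = 100
in_shop_actions = 40            # clicks followed by an in-shop action
purchases_after_action = 6      # purchases that followed an in-shop action
purchases_without_action = 4    # purchases made directly after a click

def path_cvr(clicks, actions, purchases_after_action):
    """Post-click CVR estimated as P(action | click) * P(purchase | action)."""
    return Fraction(actions, clicks) * Fraction(purchases_after_action, actions)

true_cvr = Fraction(purchases_after_action + purchases_without_action, clicks)
estimated_cvr = path_cvr(clicks, in_shop_actions, purchases_after_action)
gap = true_cvr - estimated_cvr  # mass of purchases that skip the in-shop action

print(true_cvr, estimated_cvr, gap)  # 1/10 3/50 1/25
```

Exact fractions make the point visible: the gap equals purchases_without_action / clicks, i.e. the probability mass that the "in-shop action" path never visits.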
MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions
Deep neural networks are widely used in personalized recommendation systems.
Unlike regular DNN inference workloads, recommendation inference is
memory-bound due to the many random memory accesses needed to look up the
embedding tables. The inference is also heavily latency-constrained,
because a recommendation for a user must be produced within tens of
milliseconds. In this paper, we propose MicroRec, a high-performance inference
engine for recommendation systems. MicroRec accelerates recommendation
inference by (1) redesigning the data structures involved in the embeddings to
reduce the number of lookups needed and (2) taking advantage of the
availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the
latency by enabling parallel lookups. We have implemented the resulting design
on an FPGA board including the embedding lookup step as well as the complete
inference process. Compared to the optimized CPU baseline (16 vCPU,
AVX2-enabled), MicroRec achieves 13.8~14.7x speedup on embedding lookup alone
and 2.5~5.4x speedup for the entire recommendation inference in terms of
throughput. As for latency, CPU-based engines need milliseconds to infer
a recommendation, while MicroRec takes only microseconds, a significant
advantage in real-time recommendation systems.
Comment: Accepted by MLSys'21 (the 4th Conference on Machine Learning and
Systems)
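One way to cut the number of lookups by changing the embedding data structure, in the spirit of point (1) above, is to merge two small tables into a single table indexed by their Cartesian product, so that one access returns both vectors concatenated. The sketch assumes that technique; the tables and helper names are hypothetical, and it says nothing about MicroRec's FPGA/HBM implementation.

```python
from itertools import product

def merge_tables(table_a, table_b):
    """Cartesian-product table: row i * len(table_b) + j == table_a[i] + table_b[j]."""
    return [row_a + row_b for row_a, row_b in product(table_a, table_b)]

def merged_index(i, j, rows_b):
    """Row of the merged table holding the pair (i, j)."""
    return i * rows_b + j

# Hypothetical small tables: 3 rows x 2 dims and 2 rows x 2 dims.
table_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
table_b = [[1.0, 1.1], [2.0, 2.1]]
merged = merge_tables(table_a, table_b)

i, j = 2, 1
two_lookups = table_a[i] + table_b[j]                  # two memory accesses
one_lookup = merged[merged_index(i, j, len(table_b))]  # one memory access
print(len(merged), one_lookup)  # 6 [0.5, 0.6, 2.0, 2.1]
```

The trade-off is size: the merged table has len(table_a) × len(table_b) rows, so this only pays off for small tables.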
Co-design Hardware and Algorithm for Vector Search
Vector search has emerged as the foundation for large-scale information
retrieval and machine learning systems, with search engines like Google and
Bing processing tens of thousands of queries per second on petabyte-scale
document datasets by evaluating vector similarities between encoded query texts
and web documents. As performance demands for vector search systems surge,
accelerated hardware offers a promising solution in the post-Moore's Law era.
We introduce FANNS, an end-to-end and scalable vector search framework
on FPGAs. Given a user-provided recall requirement on a dataset and a hardware
resource budget, FANNS automatically co-designs hardware and
algorithm and then generates the corresponding accelerator. The framework
also supports scale-out by incorporating a hardware TCP/IP stack in the
accelerator. FANNS attains up to 23.0x and 37.2x speedup
compared to FPGA and CPU baselines, respectively, and demonstrates superior
scalability to GPUs, achieving 5.5x and 7.6x speedup in median
and 95th percentile (P95) latency within an eight-accelerator
configuration. The remarkable performance of FANNS lays a robust
groundwork for future FPGA integration in data centers and AI supercomputers.
Comment: 11 pages
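A recall requirement like the one FANNS takes as input is commonly measured as recall@k: the fraction of the exact top-k neighbors that an approximate search returns. The sketch computes the exact top-k by brute-force inner product as a reference; the vectors and the "approximate" result are hypothetical, and FANNS's own index and recall definition may differ.

```python
import heapq

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_top_k(query, database, k):
    """Brute-force top-k ids ranked by inner-product similarity."""
    scores = ((inner_product(query, vec), idx) for idx, vec in enumerate(database))
    return [idx for _, idx in heapq.nlargest(k, scores)]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate result also contains."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

database = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
query = [1.0, 0.2]
exact = exact_top_k(query, database, k=3)
approx = [0, 3, 2]  # hypothetical output of an approximate index
print(exact, recall_at_k(approx, exact))  # exact == [0, 1, 3], recall == 2/3
```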
Fully connected neural network for simulation of extension in self-stressed monolithic slabs on ground
Желткович А. Е., Молош В. В., Пархоц K. Г., Савейко Н. Г., Юань Цзиньбинь, Чжэньхао Цзян, Чжэн Хаоюань. Modeling displacements in self-stressed monolithic slabs on ground using a fully connected neural network. This article illustrates a strategy of interdisciplinary convergence between mechanics and artificial intelligence. It presents displacements in self-stressed monolithic slabs on ground calculated with a trained fully connected neural network. Empirical displacements measured in slabs on ground, displacements calculated with the physicomechanical model, and displacements obtained with the neural network are presented. We turned to neural networks, which model biological neural networks, for the following reasons: they can autonomously detect patterns hidden in phenomena and can identify parameters from the complex behavior of different physical systems. The authors describe the developed and trained fully connected neural network in detail.
Molecular Composition of Oxygenated Organic Molecules and Their Contributions to Organic Aerosol in Beijing
Molecular-level understanding of ambient secondary organic aerosol (SOA) formation is hampered by poorly constrained formation mechanisms and insufficient analytical methods. Especially in developing countries, SOA-related haze is a great concern due to its significant effects on climate and human health. We present simultaneous measurements of gas-phase volatile organic compounds (VOCs), oxygenated organic molecules (OOMs), and particle-phase SOA in Beijing. We show that condensation of the measured OOMs explains 26-39% of the organic aerosol mass growth, with the contribution of OOMs to SOA enhanced during severe haze episodes. Our novel results provide a quantitative molecular connection from anthropogenic emissions to condensable organic oxidation product vapors, their concentration in particle-phase SOA, and ultimately to haze formation.
Peer reviewed
A New Oversampling Method Based on the Classification Contribution Degree
Data imbalance is a thorny issue in machine learning. SMOTE is a well-known oversampling method for imbalanced learning. However, it has disadvantages such as sample overlapping, noise interference, and blind neighbor selection. To address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree, which determines the number of synthetic samples SMOTE generates for each positive sample. OS-CCD follows the spatial distribution characteristics of the original samples on the class boundary and avoids oversampling from noisy points. Experiments on twelve benchmark datasets demonstrate that OS-CCD outperforms six classical oversampling methods in terms of accuracy, F1-score, AUC, and ROC.
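SMOTE creates a synthetic minority sample by interpolating between a minority sample x and one of its k nearest minority neighbors: x_new = x + λ(x_nn − x), with λ drawn uniformly from [0, 1). In OS-CCD the classification contribution degree sets how many synthetic samples each positive sample receives; the abstract does not define how that degree is computed, so the sketch takes the per-sample budgets as a hypothetical `counts` input.

```python
import math
import random

def k_nearest(points, idx, k):
    """Indices of the k nearest other points to points[idx] (Euclidean)."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]

def smote(minority, counts, k=2, seed=0):
    """Generate counts[i] synthetic samples around minority[i] by SMOTE interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        neighbors = k_nearest(minority, i, k)
        for _ in range(counts[i]):
            nn = minority[rng.choice(neighbors)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, nn)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [3.0, 3.0]]
counts = [2, 1, 1, 0]  # hypothetical budgets; 0 for a suspected noisy point
new_samples = smote(minority, counts)
print(len(new_samples))  # 4
```

Giving the isolated point a budget of 0 mimics, in a crude way, the abstract's goal of not oversampling from noisy points.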
The prognostic significance of the neutrophil-to-lymphocyte ratio and the platelet-to-lymphocyte ratio in giant cell tumor of the extremities
Background: In this study, the influence of the neutrophil-to-lymphocyte ratio (NLR) and the platelet-to-lymphocyte ratio (PLR) on the prognosis of giant cell tumor (GCT) of the extremities was investigated. Methods: The clinical parameters of 163 patients diagnosed with GCT of the extremities between July 2008 and January 2018 were retrospectively analyzed. Optimal cutoff values of NLR and PLR were determined using receiver operating characteristic (ROC) analysis. According to these cutoff values, patients were divided into high and low NLR groups or high and low PLR groups. Kaplan-Meier and log-rank methods were used to compare recurrence-free survival (RFS) between the high and low NLR groups and between the high and low PLR groups. Univariate analysis was performed to determine the influence of age, gender, neutrophil count, lymphocyte count, platelet count, white blood cell count, tumor size, surgical approach, and Campanacci stage on the prognosis of giant cell tumor of bone. The main predictors of RFS were determined by Cox multivariate regression analysis. Results: The optimal cutoff value of NLR in giant cell tumor of the extremities was 2.32, which was used to classify patients into high and low NLR groups. The optimal cutoff value of PLR was 116.81 and was used to classify patients into high and low PLR groups. Campanacci stage, maximum tumor diameter, alkaline phosphatase, and C-reactive protein (CRP) were significantly associated with high NLR and PLR. Cox multivariate regression analysis revealed that Campanacci stage (HR = 3.28, 95% CI: 1.24-8.69) and NLR (HR = 4.18, 95% CI: 1.83-9.57) were independent prognostic factors for giant cell tumor of the extremities. Conclusion: As a novel inflammatory index, NLR has some predictive power for the prognosis of patients with giant cell tumor of the extremities.
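NLR is the neutrophil count divided by the lymphocyte count (PLR uses platelets instead). The abstract only says the cutoffs came from ROC analysis; a common criterion, assumed here, is the threshold that maximizes Youden's J = sensitivity + specificity − 1. The patient values below are invented for illustration and are unrelated to the study's 2.32 cutoff.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A patient is classified as 'high' when value >= cutoff; outcomes are 1 for
    recurrence and 0 otherwise. Ties keep the lowest cutoff.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if v >= cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, outcomes) if v < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical patients: (neutrophil count, lymphocyte count, recurrence flag).
patients = [(4.0, 2.0, 0), (3.0, 2.0, 0), (5.0, 2.0, 0),
            (6.0, 2.0, 1), (7.0, 2.0, 1), (4.5, 2.0, 1)]
nlr = [n / l for n, l, _ in patients]  # neutrophil-to-lymphocyte ratio
recurrence = [y for _, _, y in patients]
cutoff, j = youden_cutoff(nlr, recurrence)
print(cutoff)  # 2.25
```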
Distributed Recommendation Inference on FPGA Clusters
Deep neural networks are widely used in personalized recommendation systems. Such models involve two major components: the memory-bound embedding layer and the computation-bound fully connected layers. Existing solutions are either slow in both stages or optimize only one of them. To implement recommendation inference efficiently in a real deployment, we design and implement an FPGA cluster that optimizes the performance of both stages. To remove the memory bottleneck, we take advantage of the High-Bandwidth Memory (HBM) available on the latest FPGAs for highly concurrent embedding table lookups. To match the required DNN computation throughput, we partition the workload across multiple FPGAs interconnected via a 100 Gbps TCP/IP network. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled) and a one-node FPGA implementation, our system (four-node version) achieves 28.95x and 7.68x throughput speedup, respectively. The proposed system also guarantees a latency of tens of microseconds per single inference, significantly better than CPU- and GPU-based systems, which take at least milliseconds.
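The abstract does not say how the DNN workload is split across the four FPGAs. Purely to illustrate balancing a computation-bound pipeline, the sketch assigns consecutive fully connected layers to nodes so that the most expensive node, which bounds pipeline throughput, is as cheap as possible. The layer widths are hypothetical, and this generic exhaustive linear-partition search is not the authors' scheme.

```python
from itertools import combinations

def fc_macs(layer_dims):
    """Multiply-accumulate count of each fully connected layer (in_dim * out_dim)."""
    return [a * b for a, b in zip(layer_dims, layer_dims[1:])]

def best_partition(costs, nodes):
    """Split costs into `nodes` contiguous stages, minimizing the largest stage cost.

    Exhaustive search over cut points; fine for a handful of layers.
    """
    best_stages, best_max = None, float("inf")
    for cuts in combinations(range(1, len(costs)), nodes - 1):
        bounds = (0, *cuts, len(costs))
        stages = [sum(costs[s:e]) for s, e in zip(bounds, bounds[1:])]
        if max(stages) < best_max:
            best_stages, best_max = stages, max(stages)
    return best_stages, best_max

# Hypothetical MLP layer widths, split over four nodes.
costs = fc_macs([512, 512, 512, 256, 256, 1])
stages, bottleneck = best_partition(costs, nodes=4)
print(stages, bottleneck)  # [262144, 262144, 131072, 65792] 262144
```

Here the first two layers each cost 262,144 MACs, so no contiguous split can push the bottleneck below that value.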
Transient analysis of LP rotor from NPP 900MW turbine
Thermal stress and contact stress under the centrifugal force field during startup and shutdown are critical to the safety of the turbine and affect its design life. The stresses at startup and shutdown are much larger than under other operating conditions. The stress level and fatigue life are important for the safety and economy of the rotor. In this paper, the temperature dependence of the material's mechanical properties is considered. The steam pressure and temperature at different positions of the rotor and at different points in the load history are used to calculate the film coefficient. A two-dimensional thermal-mechanical coupled model is used to calculate the transient temperature and stress fields. A three-dimensional contact model is used to calculate the stress field and contact stress under centrifugal loading.